ABSTRACT

We consider the problem of 20 questions with noise for collaborative players under the minimum entropy criterion [1] in the setting of stochastic search, with application to target localization. First, assuming conditionally independent collaborators, we characterize the structure of the optimal policy for constructing the sequence of questions. This generalizes the single player probabilistic bisection method [1,2] for stochastic search problems. Second, we prove a separation theorem showing that optimal joint queries achieve the same performance as a greedy sequential scheme. Third, we establish convergence rates of the mean-square error (MSE). Fourth, we derive upper bounds on the MSE of the sequential scheme. This framework provides a mathematical model for incorporating a human in the loop for active machine learning systems.

Index Terms — optimal query selection, human-machine interaction, target localization, convergence rate, minimum entropy.

1.  INTRODUCTION 

This paper addresses a problem related to maximizing the value of adding a human-in-the-loop to an autonomous learning machine, e.g., an automated target recognition (ATR) sensor. In the ATR setting the objective of the human-machine interaction is to collaborate on estimating an unknown target location, where the human is repeatedly queried about target location in order to improve ATR performance. We propose a 20 questions framework for studying the value of including the human-in-the-loop and optimizing the sequence of queries.

Motivated by the approach of Jedynak et al. [1], which was restricted to the single player case, we model the human-machine interaction as a noisy collaborative 20 questions game. In this framework a controller sequentially selects a pair of questions about target location and uses the noisy responses of the human and the machine to formulate the next pair of questions. Under the minimum expected entropy criterion, we show that even under independence between collaborative players, jointly optimal policies require overlapping non-identical queries. We prove that the expected entropy reduction for the optimal joint design is the same as that of a greedy sequential design. The greedy sequential design consists of a sequence of bisections. This yields a low complexity implementation that is guaranteed to have the same performance as the optimal query controller.

As in Jamieson et al. [3], we use a simple noisy query-response model with different reliability functions for the machine and the human (called derivative-free optimizers (DFO) in [3]). Under this model we specify the optimal query policy, establish a separation theorem, and obtain MSE bounds and convergence rates. Our model predicts that the value of including the human-in-the-loop, as measured by the MSE human gain ratio (HGR), initially increases when localization errors are large, and then slowly decreases over time as the location errors go below the human’s fine resolution capability.

2.  NOISY  20  QUESTIONS  WITH  COLLABORATIVE 
PLAYERS:  ENTROPY  LOSS 

Assume that there is a target with unknown state $X^* \in \mathcal{X} \subset \mathbb{R}^d$. Our focus in this paper is the case where the target state is spatial location, i.e., in $d = 2$ or $3$ dimensions. However, our results are also applicable to higher dimensions, e.g., where $X^*$ is a kinematic state or some other multi-dimensional target feature. Starting with a prior distribution $p_0(x)$ on $X^*$, the aim is to find an optimal policy for querying a machine (hereafter referred to as player 1), with the additional help of humans. The policy’s objective is to minimize the expected Shannon entropy of the posterior density $p_n(x)$ of the target location after $n$ questions.

There are $M$ collaborating players that can be asked questions at each time instant. The objective of the players is to come up with the correct answer to a 20 questions game. Let the $m$th player’s query at time $n$ be “does $X^*$ lie in the region $A_n^{(m)} \subset \mathbb{R}^d$?”. We denote this query as the binary variable $Z_n^{(m)} = I(X^* \in A_n^{(m)}) \in \{0, 1\}$, to which the player provides a noisy response $Y_{n+1}^{(m)} \in \{0, 1\}$.

Define the $M$-tuples $\mathbf{Y}_{n+1} = (Y_{n+1}^{(1)}, \ldots, Y_{n+1}^{(M)})$ and $\mathbf{A}_n = (A_n^{(1)}, \ldots, A_n^{(M)})$.

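Assumption 1. We assume that the players' responses are conditionally independent:

$$P(\mathbf{Y}_{n+1} = \mathbf{y} \mid \mathbf{A}_n, X^*, p_n) = \prod_{m=1}^{M} P(Y_{n+1}^{(m)} = y^{(m)} \mid A_n^{(m)}, X^*, p_n) \tag{1}$$

where

$$P(Y_{n+1}^{(m)} = y^{(m)} \mid A_n^{(m)}, X^*, p_n) = f_1^{(m)}(y^{(m)} \mid A_n^{(m)}, p_n)\, I(X^* \in A_n^{(m)}) + f_0^{(m)}(y^{(m)} \mid A_n^{(m)}, p_n)\, I(X^* \notin A_n^{(m)}). \tag{2}$$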

Assumption 2. We model the players' responses as binary symmetric channels (BSC) [4] with crossover probabilities $\epsilon_m \in (0, 1/2)$. Therefore the conditional p.m.f. $f_j^{(m)} = P(Y_{n+1}^{(m)} = \cdot \mid Z_n^{(m)} = j, A_n^{(m)}, p_n)$ of the response of the $m$th player can be written:

$$f_j^{(m)}(y^{(m)} \mid A_n^{(m)}, p_n) = (1 - \epsilon_m)\, I(y^{(m)} = j) + \epsilon_m\, I(y^{(m)} = 1 - j)$$

where $m = 1, \ldots, M$ and $j = 0, 1$.

Fig. 1. Joint scheme for $M$ collaborative players responding to binary valued queries about the location $X^*$ of an unknown target. At time $n$, the controller chooses the queries $I(X^* \in A_n^{(m)})$ based on the posterior $p_n$. Then, the $M$ players yield responses $Y_{n+1}^{(m)}$ that are fed into the fusion center, where the posterior is updated and fed back to the controller at the next time instant $n + 1$. The target location estimate $\hat{X}_n$ is the median of the posterior $p_n$.
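As an illustration of Assumptions 1 and 2, the following minimal sketch draws conditionally independent BSC responses for two players in one dimension. The function names, the interval queries, the target location and the crossover probabilities are all hypothetical choices made for this example only.

```python
import random

def bsc_response(z, eps, rng):
    """Pass the true binary answer z through a BSC with crossover probability eps."""
    return 1 - z if rng.random() < eps else z

def player_responses(x_star, queries, eps_list, rng):
    """Draw one conditionally independent response per player.

    queries[m] is a hypothetical indicator function for the region A^(m).
    """
    responses = []
    for in_region, eps in zip(queries, eps_list):
        z = 1 if in_region(x_star) else 0  # true answer Z^(m) = I(X* in A^(m))
        responses.append(bsc_response(z, eps, rng))
    return responses

rng = random.Random(0)
queries = [lambda x: x <= 0.5, lambda x: 0.25 <= x <= 0.75]  # assumed query regions
eps_list = [0.1, 0.3]                                        # assumed crossover probabilities
x_star = 0.4  # both true answers equal 1 at this location

trials = 20000
errors = [0, 0]
for _ in range(trials):
    y = player_responses(x_star, queries, eps_list, rng)
    for m in range(2):
        errors[m] += int(y[m] != 1)
error_rates = [e / trials for e in errors]
print(error_rates)  # empirical crossover rates, one per player
```

The empirical error rates should be close to the assumed crossover probabilities, since each response depends on the target only through the true answer to that player's own query.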


Fig. 2. Sequential scheme for $M$ collaborative players responding to binary valued queries about the location $X^*$ of an unknown target. At time $n$, the first controller chooses the query $I(X^* \in A_n^{(1)})$ based on the posterior $p_n$. Then, player 1 yields the response $Y_{n+1}^{(1)}$ that is used to update the posterior, and the second controller chooses the next query $I(X^* \in A_n^{(2)})$ for player 2 based on the updated posterior, etc.


2.1.  Optimal  Joint  Query  Design 

We consider a setting similar to that of [1], which applied to the $M = 1$ player case, but now we have a joint controller that chooses $M$ queries $A_n^{(m)}$ at time $n$. The system block diagram is shown in Fig. 1. Define the set of subsets of $\mathbb{R}^d$:

$$\gamma(A^{(1)}, \ldots, A^{(M)}) := \left\{ \bigcap_{m=1}^{M} (A^{(m)})^{i_m} : i_m \in \{0, 1\} \right\}$$

where $(A)^0 := A^c$ and $(A)^1 := A$. The cardinality of this set of subsets is $2^M$ and these subsets partition $\mathbb{R}^d$. The objective is to
localize  the  target  within  a  subset  A^m\ 

Define the density parameterized by $\mathbf{A}_n, p_n, i_1, \ldots, i_M$:

$$g_{i_1:i_M}(y^{(1)}, \ldots, y^{(M)} \mid \mathbf{A}_n, p_n) := \prod_{m=1}^{M} f_{i_m}^{(m)}(y^{(m)} \mid A_n^{(m)}, p_n)$$

where $i_j \in \{0, 1\}$.


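Theorem 1. (Joint Optimality Conditions) Under Assumption 1, an optimal joint policy that minimizes the Shannon entropy of the posterior distribution $p_n$ achieves the following entropy loss:

$$G^* = \sup_{A^{(1)}, \ldots, A^{(M)}} \left\{ H\!\left( \sum_{i_1:i_M=0}^{1} g_{i_1:i_M}\, P_n\!\left( \bigcap_{m=1}^{M} (A^{(m)})^{i_m} \right) \right) - \sum_{i_1:i_M=0}^{1} H(g_{i_1:i_M})\, P_n\!\left( \bigcap_{m=1}^{M} (A^{(m)})^{i_m} \right) \right\} \tag{3}$$

where $H(f)$ is the Shannon entropy of the p.m.f. $f$ and $P_n(\cdot)$ is the probability measure with density $p_n$.

Proof. Using (1) and (2), we have:

$$P(\mathbf{Y}_{n+1} = \mathbf{y} \mid \mathbf{A}_n, X^* = x, p_n) = \sum_{i_1:i_M=0}^{1} g_{i_1:i_M}(\mathbf{y} \mid \mathbf{A}_n, p_n)\, I\!\left( x \in \bigcap_{m=1}^{M} (A_n^{(m)})^{i_m} \right) \tag{4}$$

By integrating over $x \in \mathbb{R}^d$, we have:

$$P(\mathbf{Y}_{n+1} = \mathbf{y} \mid \mathbf{A}_n, p_n) = E[P(\mathbf{Y}_{n+1} = \mathbf{y} \mid \mathbf{A}_n, X^*, p_n)] = \sum_{i_1:i_M=0}^{1} g_{i_1:i_M}(\mathbf{y} \mid \mathbf{A}_n, p_n)\, P_n\!\left( \bigcap_{m=1}^{M} (A_n^{(m)})^{i_m} \right) \tag{5}$$

The difference between the entropy at time $n$ and the predicted entropy at time $n + 1$ is the mutual information ($I(X; Y \mid A, p)$ denotes the conditional mutual information of random variables $X$ and $Y$ given random variables $A$ and $p$):

$$H(p_n) - E[H(p_{n+1}) \mid \mathbf{A}_n, p_n] = I(X^*; \mathbf{Y}_{n+1} \mid \mathbf{A}_n, p_n) = H(\mathbf{Y}_{n+1} \mid \mathbf{A}_n, p_n) - H(\mathbf{Y}_{n+1} \mid X^*, \mathbf{A}_n, p_n),$$

which is equal to the argument of the “sup” on the right hand side of (3). Here we used (5) and (4). Using dynamic programming similar to Thm. 2 in [1], it follows that optimal queries satisfy (3). □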


Thm. 1 generalizes the bisection policy [1, 2] to multiple players. The fusion rule is a posterior update given by Bayes' rule:

$$p_{n+1}(x) \propto P(\mathbf{Y}_{n+1} = \mathbf{y}_{n+1} \mid \mathbf{A}_n, X^* = x, p_n)\, p_n(x)$$

where $\mathbf{y}_{n+1} \in \{0, 1\}^M$ are the observations at time $n$.
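The fusion rule can be sketched numerically on a discretized domain. The sketch below assumes BSC players as in Assumption 2 and represents the posterior by probability masses on a few grid points; the grid, the query regions and the function names are hypothetical and only serve to illustrate the Bayes update.

```python
def bsc_likelihood(y, z, eps):
    """P(response = y | true answer = z) for a BSC with crossover probability eps."""
    return 1.0 - eps if y == z else eps

def fuse(posterior, grid, queries, responses, eps_list):
    """Bayes update: p_{n+1}(x) is proportional to prod_m P(y^(m) | I(x in A^(m))) * p_n(x)."""
    updated = []
    for x, p in zip(grid, posterior):
        likelihood = 1.0
        for in_region, y, eps in zip(queries, responses, eps_list):
            z = 1 if in_region(x) else 0  # answer the query would have if X* = x
            likelihood *= bsc_likelihood(y, z, eps)
        updated.append(likelihood * p)
    total = sum(updated)
    return [u / total for u in updated]

# Hypothetical example: four grid cells on [0, 1] with a uniform prior.
grid = [0.125, 0.375, 0.625, 0.875]
prior = [0.25] * 4
queries = [lambda x: x < 0.5, lambda x: 0.25 < x < 0.75]
posterior = fuse(prior, grid, queries, responses=[1, 1], eps_list=[0.1, 0.2])
print(posterior)
```

With both players answering "yes", most of the posterior mass moves to the cell lying in the intersection of the two query regions, which is the behavior described for the two-player policy.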

2.2.  Greedy  Sequential  Query  Design 

As an alternative, we consider the following greedy sequential coordinate-by-coordinate design: ask an optimal query to the first player, then update the posterior density and ask an optimal query to the second player, and so on (see Fig. 2). In [1], the optimal query for a single player was shown to be a bisection rule. We show that this greedy sequential scheme achieves the same expected entropy loss as the optimal joint design of Thm. 1.
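A minimal simulation sketch of the greedy sequential design in one dimension is given below. It is an illustration under stated assumptions rather than the implementation analyzed in this paper: the posterior is discretized into equal cells on [0, 1], each player is queried about the region to the left of the current posterior median, and all names and parameter values are hypothetical.

```python
import random

def median_index(mass):
    """Smallest k such that the first k cells carry at least half of the mass."""
    cumulative = 0.0
    for i, p in enumerate(mass):
        cumulative += p
        if cumulative >= 0.5:
            return i + 1
    return len(mass)

def bisect_update(mass, k, y, eps):
    """Bayes update after a BSC response y to the query 'is X* in the first k cells?'."""
    updated = []
    for i, p in enumerate(mass):
        z = 1 if i < k else 0
        updated.append(p * ((1.0 - eps) if y == z else eps))
    total = sum(updated)
    return [u / total for u in updated]

def greedy_sequential(x_star, eps_list, rounds, n_cells, rng):
    """Query players one after another, bisecting the updated posterior each time."""
    mass = [1.0 / n_cells] * n_cells
    true_cell = min(int(x_star * n_cells), n_cells - 1)
    for _ in range(rounds):
        for eps in eps_list:                         # player 1, then player 2, ...
            k = median_index(mass)                   # bisection query at the median
            z = 1 if true_cell < k else 0            # true answer
            y = 1 - z if rng.random() < eps else z   # noisy BSC response
            mass = bisect_update(mass, k, y, eps)
    return (median_index(mass) - 0.5) / n_cells      # midpoint of the median cell

estimate = greedy_sequential(0.3141, [0.1, 0.2], rounds=60, n_cells=1000,
                             rng=random.Random(1))
print(estimate)
```

Each player's query is computed from the posterior already updated with the previous player's response, which is what distinguishes this scheme from choosing all M queries jointly from the same posterior.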




Fig. 3. Jointly optimal queries under a uniform prior for two-dimensional target search. The target $X^*$ is indicated by a black square. The one-player bisection rule (left) satisfies the optimality condition (7) with optimal query $A^{(1)} = [0, \frac{1}{\sqrt{2}}] \times [0, \frac{1}{\sqrt{2}}]$. The two-player bisection rule (right) satisfies

(7)  with  optimal  queries  A^)  =  [0,  §]  x  [0,  |J  U  [±,  §]  x  [±,  |],  A^2)  = 
[| ,  1]  x  [| ,  1]  U  [| ,  |]  x  [| ,  |].  We  note  that  using  the  policy  on  the  left,  if 
player 1 responds that $X^* \in [0, \frac{1}{\sqrt{2}}] \times [0, \frac{1}{\sqrt{2}}]$ with high probability, then the posterior will concentrate on that region. When using the policy on the right, if players 1 and 2 respond that $X^* \in A^{(1)} \cap A^{(2)}$ with high probability, then the posterior will concentrate more on the intersection of the queries, thus better localizing the target as compared with the single player policy.


Theorem  2.  (Separation)  Under  Assumptions  1  and  2: 

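1. The expected entropy loss under an optimal joint query design is the same as that under the greedy sequential query design. This loss is given by:

$$C = \sum_{m=1}^{M} C(\epsilon_m) = \sum_{m=1}^{M} \left( 1 - h_b(\epsilon_m) \right) \tag{6}$$

where $h_b(\cdot)$ is the binary entropy function [4].

2. All jointly optimal control laws satisfy:

$$\int_R p_n(x)\, dx = 2^{-M}, \quad \forall R \in \gamma(\mathbf{A}_n). \tag{7}$$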

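Proof. Let $G_{seq}$ denote the expected entropy loss after querying $M$ players sequentially. The bisection policy yields an expected entropy loss of $C(\epsilon_m) = 1 - h_b(\epsilon_m)$¹ after querying the $m$th player [1]. Thus, $G_{seq} = \sum_{m=1}^{M} C(\epsilon_m)$. Since the joint controller is the optimal controller, we have $G_{seq} \le G^*$. To finish the proof, we show $G_{seq} \ge G^*$. From Thm. 1,

$$\begin{aligned}
G^* &= \sup_{A^{(1)}, \ldots, A^{(M)}} \left\{ H\!\left( \sum_{i_1:i_M=0}^{1} g_{i_1:i_M}\, P_n\!\left( \bigcap_{m=1}^{M} (A^{(m)})^{i_m} \right) \right) - \sum_{i_1:i_M=0}^{1} H(g_{i_1:i_M})\, P_n\!\left( \bigcap_{m=1}^{M} (A^{(m)})^{i_m} \right) \right\} \\
&\le \sup_{p} \left\{ H(p^T g) - p^T H(g) : p \ge 0,\ \mathbf{1}^T p = 1 \right\} \qquad (8) \\
&= G_{seq}
\end{aligned}$$

where the last equality follows by the symmetry of the BSC. The supremum in the strictly concave problem (8) is achieved by the uniform distribution. □

¹ This is the channel capacity of the $m$th BSC [1, 4].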
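The equality between the joint entropy loss and the sum of the BSC capacities can be checked numerically for a small case. The sketch below assumes M = 2 players, entropies measured in bits, and hypothetical crossover probabilities; it evaluates the objective H(p^T g) - p^T H(g) for a uniform and for a non-uniform assignment of posterior mass to the 2^M partition cells.

```python
import math
from itertools import product

def binary_entropy(eps):
    """Binary entropy h_b(eps) in bits."""
    return -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)

def entropy(pmf):
    """Shannon entropy of a p.m.f. in bits."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def response_pmf(truth, eps_list):
    """g_{i_1:i_M}: p.m.f. of (y^(1), ..., y^(M)) when the true answers are `truth`."""
    pmf = []
    for y in product((0, 1), repeat=len(eps_list)):
        prob = 1.0
        for y_m, i_m, eps in zip(y, truth, eps_list):
            prob *= (1 - eps) if y_m == i_m else eps
        pmf.append(prob)
    return pmf

def joint_entropy_loss(cell_probs, eps_list):
    """H(sum_i p_i g_i) - sum_i p_i H(g_i), with cells indexed by i in {0,1}^M."""
    cells = list(product((0, 1), repeat=len(eps_list)))
    pmfs = [response_pmf(c, eps_list) for c in cells]
    mixture = [sum(w * g[k] for w, g in zip(cell_probs, pmfs))
               for k in range(len(cells))]
    return entropy(mixture) - sum(w * entropy(g) for w, g in zip(cell_probs, pmfs))

eps_list = [0.1, 0.3]  # assumed crossover probabilities
capacity_sum = sum(1 - binary_entropy(e) for e in eps_list)
loss_uniform = joint_entropy_loss([0.25] * 4, eps_list)
loss_skewed = joint_entropy_loss([0.4, 0.3, 0.2, 0.1], eps_list)
print(capacity_sum, loss_uniform, loss_skewed)
```

In this example the uniform assignment, which corresponds to condition (7), attains the sum of capacities, while the skewed assignment yields a strictly smaller loss.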


Thm. 2 shows that the optimal policy can be determined and implemented using the simpler greedy sequential query design. Note that, despite the fact that all players are conditionally independent, the joint policy does not decouple into separate single player optimal policies. This is analogous to the non-separability of the optimal vector-quantizer in source coding even for independent sources [5]. In addition, the optimal queries must be overlapping, i.e., $\bigcap_{m=1}^{M} A_n^{(m)} \neq \emptyset$, but not identical. Finally, we remark that the optimal query $\mathbf{A}_n$ is not unique, so it is possible that there exists an even simpler control law than the sequential greedy policy.

3.  LOWER  BOUNDS  ON  MSE  VIA  ENTROPY  LOSS 

Thm. 2 yields the value of the 20 questions game in terms of expected entropy reduction, which is the sum of the “capacities”² of all the players. This value function is used next to provide a lower bound on the MSE of the sequential Bayesian estimator.

Theorem 3. Let Assumptions 1 and 2 hold. Assume $H(p_0)$ is finite. Then, the MSE of the joint or sequential query policies in Thms. 1 and 2 satisfies:

$$\frac{K d}{2 \pi e}\, e^{-2nC/d} \le E\left[ \| X^* - \hat{X}_n \|_2^2 \right] \tag{9}$$

where $K = e^{2H(p_0)}$, $C$ is the entropy loss given in (6) and $\hat{X}_n$ is the posterior median.

Observe that the bound in (9) is uniform over all policies $\pi$. The bound is met with equality if the optimal policy is used, the estimation error is Gaussian with covariance $K_n$, the target estimate is taken as the conditional mean, $E_\pi[H(p_n)] = \frac{1}{2} \log((2\pi e)^d \det(E_\pi[K_n]))$ and $\det(E_\pi[K_n]) = (\mathrm{tr}(E_\pi[K_n])/d)^d$. We finally note that the MSE bound decays exponentially as a function of the number of queries $n$. The proof is given in [6].

4.  UPPER  BOUNDS  ON  MSE 

The performance analysis of the bisection method is difficult primarily due to the continuous nature of the posterior [2]. A discretized version of the probabilistic bisection method was proposed in [7], using the Burnashev-Zigangirov (BZ) algorithm, which imposes a piecewise constant structure on the posterior. A description of the BZ algorithm and its convergence rate is given in [2] (also see App. A in [8]). For simplicity of discussion, we assume the target location is constrained to the unit interval $\mathcal{X} = [0, 1]$. A step size $\Delta > 0$ is defined such that $\Delta^{-1} \in \mathbb{N}$ and the posterior after $j$ iterations is $p_j : \mathcal{X} \to \mathbb{R}$, given by

$$p_j(x) = \frac{1}{\Delta} \sum_{i=1}^{\Delta^{-1}} a_i(j)\, I(x \in I_i)$$

where $I_1 = [0, \Delta]$ and $I_i = ((i - 1)\Delta, i\Delta]$ for $i = 2, \ldots, \Delta^{-1}$. The initial posterior is $a_i(0) = \Delta$. The posterior is characterized completely by the pseudo-posterior $\mathbf{a}(j) = [a_1(j), \ldots, a_{\Delta^{-1}}(j)]$, which is updated at each iteration via Bayes rule [8].
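The piecewise constant representation can be sketched as a small data structure. The sketch assumes the normalization p_j(x) = (1/Δ) Σ_i a_i(j) I(x ∈ I_i) with the a_i(j) summing to one, so that the density integrates to one over [0, 1]; the function names and the example pseudo-posterior are hypothetical, and the BZ update rule itself is not reproduced here.

```python
import math

def initial_pseudo_posterior(num_bins):
    """a_i(0) = Delta for every bin, with Delta = 1 / num_bins."""
    return [1.0 / num_bins] * num_bins

def density(a, x):
    """Evaluate p_j(x) = (1/Delta) * sum_i a_i * I(x in I_i) on [0, 1]."""
    if not 0.0 <= x <= 1.0:
        return 0.0
    num_bins = len(a)
    # 1-based bin index: I_1 = [0, Delta], I_i = ((i-1)Delta, i*Delta] for i >= 2.
    i = max(math.ceil(x * num_bins), 1)
    return a[i - 1] * num_bins

a0 = initial_pseudo_posterior(4)   # uniform start with Delta = 1/4
a = [0.1, 0.2, 0.3, 0.4]           # hypothetical pseudo-posterior after some updates
print(density(a0, 0.3), density(a, 0.3), sum(a))
```

Storing only the vector a(j) is sufficient because the density is constant on each interval, so any posterior quantity such as the median can be computed from cumulative sums of the a_i(j).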

Convergence rates were derived for the one-dimensional case in [2] for the bounded noise case (i.e., constant error probability) and for the unbounded noise case (i.e., error probability depends on distance from target $X^*$ and converges to 1/2 as the estimate reaches the target) in [9]. A modified version of this algorithm that provably handles unbounded noise was given in [9]. Thm. 4 derives upper bounds on MSE using ideas from [9]. The proof is given in [6].

² The “capacity” of each player is the channel capacity of each BSC [4].

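Theorem 4. Consider the sequential bisection algorithm for $M$ players in one dimension, where each bisection is implemented using the BZ algorithm. Then, we have:

$$P(|X^* - \hat{X}_n| > \Delta) \le (\Delta^{-1} - 1) \exp(-n\bar{C})$$

$$E[(X^* - \hat{X}_n)^2] \le (2^{-2/3} + 2^{1/3}) \exp\left( -\frac{2}{3} n \bar{C} \right) \tag{10}$$

where $\bar{C} = \sum_{m=1}^{M} \bar{C}(\epsilon_m)$ and $\bar{C}(\epsilon) = 1/2 - \sqrt{\epsilon(1 - \epsilon)}$.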

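The combination of the lower bound (Thm. 3) and the upper bound (Thm. 4) implies that the MSE of the BZ algorithm goes to zero at an exponential rate with rate constant between $2C$ and $\frac{2}{3}\bar{C}$, where $C$ is the entropy loss in (6) and $\bar{C}$ is the constant defined in Thm. 4.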
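The MSE upper bound of Thm. 4 can be evaluated numerically as follows. The sketch assumes an intermediate bound of the form Δ² + Δ⁻¹ exp(-n C̄), a resolution term plus a tail term as in the Δ-dependent bound of Section 5, and ignores the constraint that Δ⁻¹ be an integer; the crossover probabilities and the number of queries are hypothetical.

```python
import math

def c_bar(eps):
    """Rate constant 1/2 - sqrt(eps * (1 - eps)) of one BSC player."""
    return 0.5 - math.sqrt(eps * (1.0 - eps))

def mse_bound_given_step(delta, n, rate):
    """Assumed intermediate bound: delta^2 + delta^{-1} * exp(-n * rate)."""
    return delta ** 2 + math.exp(-n * rate) / delta

def mse_bound_closed_form(n, rate):
    """Right hand side of (10)."""
    return (2 ** (-2 / 3) + 2 ** (1 / 3)) * math.exp(-(2 / 3) * n * rate)

eps_list = [0.1, 0.3]                       # assumed crossover probabilities
rate = sum(c_bar(e) for e in eps_list)      # sum of per-player rate constants
n = 20
delta_star = 2 ** (-1 / 3) * math.exp(-n * rate / 3)  # minimizer of the assumed bound
plug_in = mse_bound_given_step(delta_star, n, rate)
closed = mse_bound_closed_form(n, rate)
print(plug_in, closed)
```

Substituting the minimizing step size into the assumed intermediate bound reproduces the constant 2^{-2/3} + 2^{1/3} and the exponent -(2/3) n C̄ that appear in (10).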


5.  HUMAN-IN-THE-LOOP 


In this section, we consider a particular 2-player case where player 1 (the machine) has a constant error probability $\epsilon_1 \in (0, 1/2)$ and player 2 (the human) has error probability depending on the target localization error after the most recent query:

$$P(Y_{n+1}^{(2)} = y^{(2)} \mid Z_n^{(2)} = 1 - y^{(2)}) = \frac{1}{2} - \min(\delta_0, \mu |X^* - \hat{X}_n|^{\kappa - 1}) \tag{11}$$

where $\kappa > 1$ and $0 < \delta_0 < \mu < 1/2$ are reliability parameters that characterize the human³. This is a popular model used for human-based optimization [3] and has also appeared in the unbounded noise case [9] for binary classification. From the nature of the error probability (11), we expect that the answers provided by the human will be helpful in the beginning iterations but their value will go to zero as the number of iterations grows to infinity. This is because the human propensity for error becomes larger as the questions become more highly resolved.
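The behavior of the human error model (11) can be illustrated with a short sketch. The function name and the parameter values below are hypothetical; the sketch only evaluates 1/2 - min(δ₀, μ |X* - X̂ₙ|^(κ-1)) for a few localization errors.

```python
def human_error_probability(loc_error, delta0, mu, kappa):
    """Error probability of the human as a function of the localization error."""
    return 0.5 - min(delta0, mu * abs(loc_error) ** (kappa - 1))

delta0, mu, kappa = 0.2, 0.4, 2.0  # assumed reliability parameters
for err in (1.0, 0.1, 0.0):
    print(err, human_error_probability(err, delta0, mu, kappa))
```

For large localization errors the error probability saturates at 1/2 - δ₀, while it approaches 1/2 as the estimate approaches the target, so the human's responses become uninformative at fine resolution.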

Using  a  similar  technique  as  in  the  proof  of  Thm.  4,  and 
using  the  modified  BZ  algorithm  [9],  from  Lemma  1  in  [9],  we 
have  the  following.  For  n  >  2  with  — =,  0:2  = 

VelTV  1  el 

0.09/i(3A/4)re_1: 


P( \X*-Xn\  >  A)  <  A”1  exp  (-n  C(e i)  +  ^ 

This  leads  to  the  MSE  upper  bound  dependent  on  A: 


E[(X*-Xn)2]  <  A2+A_1  exp  (  -n 


<?(ei)+M(x)2K"2 


With  the  choice  A  =  2~ll3e~nG{tx)l3 , 


(12) 


E[(X*  -  Xn)2]  <  exp  --nC(ei) 


2-2/3  +  2i/s  exp  ( 

oU 


'3  •  2_1/3\2fc-2  _nC(ei)^2 


(13) 


³ The parameter $\kappa$ controls the “resolution” of the human. It becomes increasingly difficult for the human to decide between close hypotheses as $\kappa$ goes to infinity.

which is no greater than the “player 1” MSE bound (compare (13) with (10)). Asymptotically as $n \to \infty$, both bounds converge to zero at the same rate.

We  define  the  human  gain  ratio  (HGR)  as  the  ratio  of  MSE  upper 
bounds  associated  with  “player  1”  alone  and  “player  1  +  human”, 
respectively,  given  by 

=  _ 2~2/3  +  21/3 _ 

2-2/3  21/3  exp  5d~(3  241/3  )2K~2ne~n^^ei^^~ ^ 

(14) 

The HGR is plotted in Fig. 4 as a function of $\kappa$. This analysis quantifies the value of including the human-in-the-loop for a sequential target localization task. We note that the larger $\mu$ is, the larger the HGR. Also, as $\kappa$ decreases to 1, the ratio increases, meaning that the value of including the human in the loop increases.


Fig. 4. Human gain ratio (see Eq. (14)) as a function of $\kappa$ (panel title: $\epsilon_1 = 0.4$). The human provides the largest gain in the first few question iterations and the additional contribution of the human decreases as $n \to \infty$. The circles are the predicted curves according to (13), while the solid lines are the optimized versions of the bound (12) as a function of $\Delta$ for each $n$. The predictions closely match the optimized bounds predicted by the theory in Section 5.

Monte Carlo simulation experiments are given in [6], where it is observed that employing a human in the loop reduces the MSE relative to only having player 1. It is also noted that the gap between the MSE curves associated with “player 1” and “player 1 + human” initially increases to a maximum and then diminishes to zero. This could be used to motivate a stopping rule for including the human when the cost of using the human increases over time or human fatigue prevents repeated querying, which is a worthwhile subject of future work.


6.  CONCLUSION 

We studied the problem of collaborative 20 questions with noise for the multiplayer case. We derived a separation theorem that shows the jointly optimal design is equivalent to a greedy sequential design that can be more easily implemented. Using this framework, we obtained bounds for the performance of human-in-the-loop target localization systems. Future work includes integration of a noisy continuous valued sensor and controlling the human query rate.